Des Plaines
Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting
Jiaheng Wei, Yuanshun Yao, Jean-Francois Ton, Hongyi Guo, Andrew Estornell, Yang Liu
LLMs are known to produce factually inaccurate information that is stated confidently, i.e., hallucination. This is currently a major obstacle to the reliability and trustworthiness of LLMs [13, 34, 21]. An essential step towards solving this problem is measuring hallucinations. However, this is challenging from a data perspective, as existing metrics presume that benchmark datasets possess gold-standard answers, i.e., "best" or "correct" answers written by humans [16]. The requirement of such answers imposes two fundamental limitations on hallucination measurement: 1) hiring human annotators to produce gold-standard answers is costly in both time and money [4, 43, 38]; 2) gold-standard answers are prone to natural human errors [7, 6, 49]. To this end, we propose a framework that measures LLM hallucinations without requiring gold-standard answers. Our framework is partially inspired by the literature on learning with noisy labels [23, 18, 19], where there are no ground-truth labels for verifying the quality of imperfect human annotations [43, 38, 20], detecting annotation errors [48, 26, 47], or training models robustly [42, 3, 17, 36, 39]. Our basic idea is simple: leverage off-the-shelf, high-quality LLMs to generate answers that serve as a proxy for gold-standard answers. The primary challenge in such an approach is how to properly weigh the expertise of each LLM for a given question x, without a priori knowledge of the true (i.e.
- North America > Canada > British Columbia (0.04)
- Europe > France (0.04)
- North America > United States > Illinois > Cook County > Des Plaines (0.04)
- Leisure & Entertainment > Sports > Football (1.00)
- Media > Television (0.68)
- North America > United States > California > San Mateo County > Menlo Park (0.19)
- North America > United States > Virginia > Fairfax County > Reston (0.05)
- North America > United States > Oregon > Benton County > Corvallis (0.05)
Applied AI News
Blue Cross/Blue Shield of Virginia (Richmond, VA) has developed an expert system to classify, evaluate and process medical claims. The system, called MedScreen, reportedly can process up to 500 claims in 45 minutes, an operation that used to take several days to complete.

AT&T's Merrimack Valley Works (North Andover, MA) has developed the Expert Capacity and Material System (XCAM), an expert system which simplifies forecast evaluations for a manufacturing operation. The system automates the analysis of [...]

The US Army Laboratory Command's Human Engineering Laboratory (Aberdeen Proving Ground, MD) has awarded a $2.4 million contract to Carnegie Group (Pittsburgh, PA) to continue work on a knowledge-based logistics planning system. The system [...]

IBM (Armonk, NY) and Dragon Systems (Newton, MA) have jointly developed VoiceType, a speech recognition system based on elements of [...] allows hands-free typing.

[...] NRM has been successfully deployed in a number of Australian banks, as well as a food storage and distribution center.

ICL (Birmingham, England) has completed a pilot test of an intelligent system for field service diagnosing [...] ICL used a laptop-based [...]
- North America > United States > Virginia > Richmond (0.25)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.25)
- North America > United States > Massachusetts > Middlesex County > Newton (0.25)
- Health & Medicine (1.00)
- Banking & Finance > Insurance (1.00)
- Government > Regional Government > North America Government > United States Government (0.70)